Add buf as an alternate Python protobuf code generator #23343
Conversation
Thanks for the PR! Recently the maintainers have been discussing that new backends should first start out as published plugins in a user's repo, so that the community can mess around and flesh out the APIs and utility, before it gets merged into Pants directly and becomes more relied on. I was just commenting on another PR yesterday that I need to add docs to this effect. A couple of simple examples of that are here: This would also let us extract bits and pieces at a time that we know are more API stable (e.g. once there are some lint users who are happy, then pull that in, etc).
Thanks for the quick response @sureshjoshi! In this case, This PR adds on the Would it still make sense to start as a plugin in this case?
Ahh, okay okay, I thought I'll just point out that this will take some time to review, and just at a glance, it feels like a lot of code for what it's doing, which is common for Claude-driven PRs, FYI. So, thanks for the PR, and please have some patience on the review.
Thanks @sureshjoshi! Quite a few of the changes are just mechanical -- moving files related to buf linting/formatting that were under a subdirectory that wouldn't make sense if buf also does code generation. I could split those into a separate PR but I didn't since there wouldn't be a justification for moving those files without adding the new functionality. Let me know if you have a preference. I can also just put them in a different commit within this PR if it makes review easier.
This would be awesome to have. I'll check this out and give it a try on our codebase. |
Several standalone smaller, simpler PRs will always make for an easier review :) |
Sounds good. I'll make a couple more PRs and then rebase to simplify this one. Here's the first to bump the |
Background
buf is the modern toolchain for Protocol Buffers, replacing ad-hoc `protoc` invocations with a coherent ecosystem:

- `buf.yaml` / `buf.lock` declare and pin proto module dependencies (the Buf Schema Registry, or BSR, is the analogue of a package index).
- `buf.gen.yaml` centrally configures codegen plugins. Plugins can be `remote:` (hosted on the BSR, fetched with version + revision pins) or `protoc_builtin:` / `local:` (plain protoc-style binaries on PATH).
- ConnectRPC and protovalidate are first-class buf citizens; they're not really usable through the existing protoc backend without out-of-graph workarounds.
For Pants users, the existing `pants.backend.codegen.protobuf.python` backend already supports `buf format` / `buf lint`, but codegen is `protoc`-driven. This PR adds the codegen half: a `protobuf_generator='buf'` target field that opts a `protobuf_source` into running through `buf generate` instead of `protoc`. The `protoc` path is unchanged when the field is left at its default.
Concrete benefits Pants users get:

- `connectrpc` and `protovalidate` runtime deps automatically based on which plugins appear in `buf.gen.yaml`.
- `buf.lock` is the single source of truth for BSR commits. `pants generate-lockfiles` regenerates it via `buf dep update`. Codegen is blocked early with a fix-instruction error if `buf.yaml` declares `deps:` without a sibling `buf.lock`, so reproducibility isn't optional.
- binary tools elsewhere.
- `remote:` plugins must declare `version` + `revision`; Pants's built-in `DEFAULT_PLUGIN_PINS` synthesizes pins for popular plugins so users don't have to look them up by hand.
- `buf_gen_template` field, so individual proto targets can use a different `buf.gen.yaml` without forking the global config.
example-buf is a full working reference repo with `buf.yaml` + `buf.lock` + `buf.gen.yaml`, a Python ConnectRPC server, validation via protovalidate, and a TypeScript client that shares the same `idl/` and `buf.lock`.

Usage
Opting in is one field on the proto target:
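As a minimal sketch, assuming a `protobuf_sources` target in a BUILD file (the target name `protos` is a placeholder; only the `protobuf_generator` field and its `'buf'` value come from this description), the opt-in might look like:

```python
# BUILD (sketch): route the protos in this directory through `buf generate`.
protobuf_sources(
    name="protos",  # placeholder target name
    protobuf_generator="buf",  # omit the field to keep the default protoc path
)
```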
With a typical layout:
What changes for the user, end-to-end:
- `pants export-codegen ::` now runs `buf generate` for `buf` targets and `protoc` for everything else. The two paths produce the same kinds of artifacts (`*_pb2.py`, `*_pb2_grpc.py`, etc.), so downstream targets don't need to know which generator was used.
- `buf.gen.yaml` to learn which suffixes (`_pb2`, `_pb2_grpc`, `_grpc`, `_connect`) the configured plugins emit, and registers the corresponding modules per proto. New: BSR-dep-provided modules (`buf.validate.validate_pb2`, `google.protobuf.timestamp_pb2`, etc.) are also registered, gated on `include_imports: true` being set on the `protocolbuffers/python` plugin (so we only claim ownership when the file is actually generated).
- `buf.gen.yaml` plugin presence rather than the existing `grpc=True` field. A `connectrpc/python` plugin produces `_connect.py` files, so we infer the `connectrpc` PyPI package; a `grpc/python` plugin → `grpcio`; a `grpclib_python` local plugin → `grpclib`. The protoc branch's `grpc=True`-driven inference is unchanged for non-buf targets.
- `pants generate-lockfiles` now also resolves `buf.yaml` deps. Each `buf.yaml` is a resolve named after its parent dir (`buf` for repo-root). `pants generate-lockfiles --resolve=buf` runs `buf dep update` in a sandbox.
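The plugin-driven inference described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: plugins are taken as dicts already parsed from `buf.gen.yaml`, proto paths are assumed to be relative to their source root, and the helper names (`plugin_id`, `infer_runtime_deps`, `modules_for_proto`) and the two tables are hypothetical.

```python
from pathlib import PurePosixPath

# Plugin id -> PyPI package to infer, following the mapping described above.
PLUGIN_RUNTIME_DEPS = {
    "connectrpc/python": "connectrpc",
    "grpc/python": "grpcio",
    "grpclib_python": "grpclib",
}

# Plugin id -> suffix of the modules it emits (illustrative subset).
PLUGIN_SUFFIXES = {
    "protocolbuffers/python": "_pb2",
    "grpc/python": "_pb2_grpc",
    "grpclib_python": "_grpc",
    "connectrpc/python": "_connect",
}


def plugin_id(plugin):
    """Normalize a plugin entry to a bare id such as 'grpc/python'."""
    ref = plugin.get("remote") or plugin.get("local") or plugin.get("protoc_builtin") or ""
    if isinstance(ref, list):  # `local:` may be given as a command list
        ref = ref[0] if ref else ""
    ref = ref.split(":", 1)[0]  # drop any ':version' suffix
    for prefix in ("buf.build/", "protoc-gen-"):
        if ref.startswith(prefix):
            ref = ref[len(prefix):]
    return ref


def infer_runtime_deps(plugins):
    """PyPI packages implied by the plugins present in buf.gen.yaml."""
    deps = set()
    for plugin in plugins:
        dep = PLUGIN_RUNTIME_DEPS.get(plugin_id(plugin))
        if dep is not None:
            deps.add(dep)
    return sorted(deps)


def modules_for_proto(proto_path, plugins):
    """Python module names the configured plugins would emit for one proto."""
    base = ".".join(PurePosixPath(proto_path).with_suffix("").parts)
    modules = []
    for plugin in plugins:
        suffix = PLUGIN_SUFFIXES.get(plugin_id(plugin))
        if suffix is not None:
            modules.append(base + suffix)
    return modules
```

Matching on a normalized id rather than a substring avoids accidental overlaps between plugin names.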
New options:

- `[buf].extra_plugin_pins`: for `remote:` plugins not in `DEFAULT_PLUGIN_PINS`.
- `[python-protobuf].extra_buf_plugin_suffixes`: for custom or forked plugins that emit `_pb2` / etc. modules.
- `[python-protobuf].extra_buf_bsr_modules`: for BSR deps not in `DEFAULT_BSR_DEP_MODULES`.

All three follow the same "registry + extras" pattern as `DEFAULT_MODULE_MAPPING` / `extra_module_mapping`.

A worked example of all of the above is in example-buf: clone it next to your `pants` checkout and run `./pants_from_sources` to see Pants resolve `buf.lock`, run buf-driven codegen, infer `connectrpc` + `protovalidate` runtime deps, and execute pytest tests against the validator interceptor.
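The "registry + extras" pattern shared by the three new options can be sketched as a built-in mapping with user-supplied entries layered on top. The registry contents, the shape of the option values, and the rule that user entries win on conflict are all assumptions made for this illustration; `with_extras` is a hypothetical helper name.

```python
from types import MappingProxyType

# Placeholder built-in registry (read-only); not the PR's real contents.
EXAMPLE_DEFAULT_SUFFIXES = MappingProxyType(
    {"protocolbuffers/python": ("_pb2",)}
)


def with_extras(defaults, extras):
    """Layer user-supplied entries over the built-in registry.

    Assumption: on a key conflict the user's entry takes precedence.
    """
    merged = dict(defaults)  # copy, so the built-in registry is never mutated
    merged.update(extras)
    return merged
```

Keeping the defaults read-only means one target's extras cannot leak into another's lookups.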
Code design
New module: `src/python/pants/backend/codegen/protobuf/buf/`. Houses language-agnostic helpers (yaml parsing, plugin-id matching, pin synthesis, lockfile rules) so future Go / JVM / etc. buf integrations don't have to re-implement the plumbing.

- `buf/config.py`: yaml parsing (`parse_buf_yaml_module_paths`, `parse_buf_yaml_deps`, `parse_plugin_outs`, `python_pb2_include_imports`); pin synthesis (`synthesize_pinned_buf_gen_yaml` + `DEFAULT_PLUGIN_PINS`); module-root resolution; per-target template-request resolvers; async digest fetchers (`fetch_buf_layout`, `fetch_buf_gen_contents`). Organized into clear sections (`# ---- buf.yaml parsers ----`, etc.) for top-down readability.
- `buf/subsystem.py`: moved up from `lint/buf/` since codegen needs it too. Adds the `extra_plugin_pins` option.
- `buf/fields.py`: `BufGenTemplateField` plugin field, registered on `ProtobufSourceTarget` / `ProtobufSourcesGeneratorTarget`.
- `buf/lockfile.py`: `KnownBufResolveNamesRequest` / `RequestedBufResolveNames` / `GenerateBufLockfile` plus their rules, hooking into the standard `pants generate-lockfiles` machinery.

Python-specific bits:
- `python/buf_rules.py`: the `generate_python_from_protobuf_via_buf` rule. Bundles `buf` + `protoc` into the sandbox (`protoc` so `protoc_builtin:` / `local:` plugins resolve), synthesizes a fully-pinned `buf.gen.yaml`, runs `buf generate`, and returns the digest. Pre-checks for `buf.lock` when `buf.yaml` declares `deps:`, raising `MissingBufLockError` with a `pants generate-lockfiles --resolve=...` pointer.
- `python/python_protobuf_subsystem.py`: adds `DEFAULT_PLUGIN_SUFFIXES`, `DEFAULT_BSR_DEP_MODULES`, the `extra_buf_*` options, and the buf branch of runtime-dep inference. Subsystem booleans (`grpcio_plugin`, `mypy_plugin`, etc.) and the `grpc=True` field are warned-but-ignored on the buf path.
- `python/python_protobuf_module_mapper.py`: extended to register per-proto modules from `buf.gen.yaml` + per-buf-module BSR-dep modules.

Cache-isolation correctness: `buf_rules.py` builds the input digest from `transitive_targets.closure` of the target's address. A monorepo with many unrelated protos in the same buf module sends only the per-target closure into the sandbox. Verified by `test_buf_only_sends_transitive_closure_to_sandbox` (the test plants a malformed sibling proto; codegen succeeding for the unrelated target proves the malformed file was filtered out).
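The `buf.lock` pre-check attributed to `python/buf_rules.py` can be sketched as a plain function. This is only an illustration: the `MissingBufLockError` class here is a stand-in for the one named above, `declared_deps` is assumed to be the already-parsed `deps:` list, `sandbox_files` is assumed to be a set of repo-relative paths, and the resolve name is assumed to be the last path component of the directory holding `buf.yaml`.

```python
from pathlib import PurePosixPath


class MissingBufLockError(Exception):
    """Stand-in for the error named in the PR; the real class may differ."""


def resolve_name(buf_yaml_path):
    """Parent directory name of buf.yaml, or 'buf' when it sits at the repo root."""
    parent = PurePosixPath(buf_yaml_path).parent
    return "buf" if str(parent) == "." else parent.name


def check_buf_lock(buf_yaml_path, declared_deps, sandbox_files):
    """Fail early when buf.yaml declares deps but no sibling buf.lock exists."""
    if not declared_deps:
        return  # nothing to pin, so no lockfile is required
    lock_path = str(PurePosixPath(buf_yaml_path).with_name("buf.lock"))
    if lock_path not in sandbox_files:
        raise MissingBufLockError(
            f"{buf_yaml_path} declares deps but {lock_path} is missing. "
            f"Run `pants generate-lockfiles --resolve={resolve_name(buf_yaml_path)}`."
        )
```

Running the check before invoking `buf generate` keeps the failure message actionable instead of surfacing a lower-level tool error.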
Backward compatibility: the `protobuf_generator` field defaults to `protoc`, so existing `protobuf_sources(...)` declarations are unaffected. The protoc codegen rule, mapper rule, and runtime-dep inference all branch on the field value before doing anything new.
Tests cover (buf-only; no protoc-side test additions):

- `include_imports`-gating (unit tests in `buf/config_test.py`)
- (`buf/lockfile_test.py`)
- `extra_buf_bsr_modules` (`python/python_protobuf_module_mapper_test.py`)
- (`python/python_protobuf_subsystem_test.py`)
- plugin via BSR network fetch, missing `buf.lock` error, sandbox closure-isolation including same-BUILD-file scenarios (`python/buf_rules_integration_test.py`)

Related issues
- addressed: this PR ships the Python half (ConnectRPC + protovalidate via `buf generate`). The language-agnostic helpers in `buf/config.py` / `buf/lockfile.py` / `buf/subsystem.py` are the foundation a Go backend would slot into.
- Completes the codegen half of the original buf integration request. The lint/format half shipped years ago.
- opts into `protobuf_generator='buf'`, plugin selection lives entirely in `buf.gen.yaml`, which is exactly the flexibility the OP asked for.

LLM assistance: Code primarily written by Claude Code, but I iterated on the design with Claude. (Per the contribution guide.)